Updates on Pacific Island Countries¶

By Charlie Zhang

EDA¶

Three variables are available for the Pacific Island countries:

  • Available Seat Kilometers, which measures passenger-carrying capacity as available seats × kilometers flown. Because the data set has no corresponding flight distance, this variable adds little value for further analysis.

  • Number of Flights in 7 Days and Number of Passengers in 7 Days, which are therefore the main focus of this analysis.

Cross-country comparisons¶

Number of Flights in 7 Days¶

In [1]:
from IPython.display import Image
Image(filename='Viz/NumFlsIn7Days.png', height=800, width=600)
Out[1]:
In [2]:
Image(filename='Viz/NumberFlsIn7Days_freescale.png', 
      height=800, width=600) 
Out[2]:
In [3]:
from IPython.display import HTML
HTML(filename='Viz/fl.html')
Out[3]:
Bokeh Plot

Number of Passengers in 7 Days¶

In [4]:
HTML(filename='Viz/psg.html')
Out[4]:
Bokeh Plot

Statistical Properties¶

Missing Data¶

In [5]:
Image(filename='Viz/ms_heatmap.png', width=800) 
Out[5]:

The heatmap above shows that psg_wow_change and fl_wow_change are missing in exactly the same rows.

Fiji and Papua New Guinea have no missing data. For the other countries, the share of dates with available data (the ratio column, in %) ranges from 95.14% in Solomon Islands down to 11.38% in Tuvalu, so the missing share runs from 4.86% to 88.62%. In the table below:

  • date_range is the number of days between the first and last recorded dates;
  • df_length is the length of the available dataframe;
  • missing counts the missing items in the available dataframe;
  • available is the true count of available observations over the recorded period (df_length minus missing); and
  • ratio is available as a percentage of date_range.
In [7]:
import pandas as pd
missing = pd.read_csv("Output/missing_count.csv")
missing.drop("Unnamed: 0", axis=1).sort_values(by="ratio", ascending=False)
Out[7]:
destination_country missing df_length date_range available ratio
6 Solomon Islands 20 843 865 823 95.144509
5 Samoa 32 829 865 797 92.138728
7 Tonga 41 822 865 781 90.289017
9 Vanuatu 19 756 836 737 88.157895
2 Micronesia (Federated States of) 105 736 865 631 72.947977
4 Palau 101 625 865 524 60.578035
3 Nauru 166 668 865 502 58.034682
1 Marshall Islands (the) 126 586 865 460 53.179191
0 Kiribati 150 294 864 144 16.666667
8 Tuvalu 51 144 817 93 11.383109
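
The available and ratio columns above can be reproduced from the other columns. The following is a minimal sketch that assumes the definitions available = df_length - missing and ratio = 100 × available / date_range, which match the figures in the table; the missing_share column is an illustrative addition, not part of the original output.

```python
import pandas as pd

# Three rows copied from the table above.
rows = pd.DataFrame({
    "destination_country": ["Solomon Islands", "Kiribati", "Tuvalu"],
    "missing": [20, 150, 51],
    "df_length": [843, 294, 144],
    "date_range": [865, 864, 817],
})

# Assumed definitions, consistent with the table values.
rows["available"] = rows["df_length"] - rows["missing"]
rows["ratio"] = 100 * rows["available"] / rows["date_range"]

# Illustrative extra column: share of the date range without usable data.
rows["missing_share"] = 100 - rows["ratio"]

print(rows.sort_values(by="ratio", ascending=False))
```

Read this way, a high ratio means good coverage, so Solomon Islands is the best-covered country in the table and Tuvalu the worst.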

The Covid Stringency Index shows a similar pattern: Fiji, Papua New Guinea, Tonga, and Solomon Islands are the countries with the least missing data. Detailed null counts are listed below:

  • 1 out of 511 in TON is null.
  • 6 out of 701 in SLB are null.
  • 14 out of 957 in FJI are null.
  • 14 out of 907 in PNG are null.
  • 21 out of 483 in KIR are null.
  • 21 out of 672 in VUT are null.
  • 451 out of 451 in TUV are null.
  • 387 out of 387 in PLW are null.
  • 460 out of 460 in NRU are null.
  • 859 out of 859 in MHL are null.
  • 600 out of 600 in FSM are null.
  • 664 out of 664 in WSM are null.

Descriptive Statistics¶

Covid-Cutoff¶

In [8]:
covid_compare = pd.read_csv("Output/covid_cutoff_stats.csv")
covid_compare
Out[8]:
Unnamed: 0 flights_number_7days_before flights_number_7days_after flights_7days_change (in %) passengers_number_7days_before passengers_number_7days_after passengers_7days_change (in %)
0 Vanuatu 26.792373 29.347409 9.536430 3028.190678 2553.804223 -15.665673
1 Solomon Islands 7.199248 2.636678 -63.375645 935.939850 386.366782 -58.718845
2 Papua New Guinea 454.928571 465.118774 2.239957 34249.642857 32915.454662 -3.895481
3 Fiji 287.924051 67.417513 -76.584967 26749.341772 6948.908629 -74.022132

Variability¶

One way to gauge variability is the z-score, which measures how many standard deviations an observation lies from the mean. The cell below reports the maximum z-score per country.

In [9]:
import numpy as np
zscore = pd.read_csv("Output/z-score.csv").drop("Unnamed: 0", axis=1)
zscore.groupby("destination_country").apply(np.max, axis=0)
Out[9]:
destination_country date fl_zscore psg_zscore
destination_country
Fiji Fiji 2022-05-15 3.559036 3.488758
Kiribati Kiribati 2022-05-14 3.313215 3.223353
Marshall Islands (the) Marshall Islands (the) 2022-05-15 3.379459 3.388422
Micronesia (Federated States of) Micronesia (Federated States of) 2022-05-15 4.417831 5.061285
Nauru Nauru 2022-05-15 3.554112 3.617700
Palau Palau 2022-05-15 3.760119 3.961421
Papua New Guinea Papua New Guinea 2022-05-15 1.581494 1.698900
Samoa Samoa 2022-05-15 4.543945 4.544906
Solomon Islands Solomon Islands 2022-05-15 3.595390 3.892513
Tonga Tonga 2022-05-15 5.722493 4.538809
Tuvalu Tuvalu 2022-03-28 2.039153 2.039153
Vanuatu Vanuatu 2022-04-16 2.933568 3.356023
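
How the z-scores in Output/z-score.csv were computed is not shown here. The sketch below is one plausible way to do it, assuming a per-country mean and sample standard deviation (ddof=1); the toy data and the country labels "A" and "B" are hypothetical. It also takes the largest absolute z-score, so that unusually low weeks are counted as well as unusually high ones.

```python
import pandas as pd

# Hypothetical toy data: two countries, five observations each.
df = pd.DataFrame({
    "destination_country": ["A"] * 5 + ["B"] * 5,
    "flights_number_7days": [10, 12, 11, 13, 30, 100, 100, 101, 99, 100],
})

# Per-country mean and sample standard deviation, broadcast back to each row.
grouped = df.groupby("destination_country")["flights_number_7days"]
df["fl_zscore"] = (
    df["flights_number_7days"] - grouped.transform("mean")
) / grouped.transform("std")

# Largest absolute deviation per country.
peak = df["fl_zscore"].abs().groupby(df["destination_country"]).max()
print(peak)
```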

Prospects¶

Other Data Sources¶

The IMF Tourism Tracker provides estimated visitor numbers for 2020-2021. Its method is quoted below:

Chinese visitors to Fiji fell by 73 percent in February relative to a year earlier. And Chinese visitors to Palau accounted for 32 percent of total visitors in 2019. Multiplying the two percentages yields the percentage point contribution to the change in visitors to Palau from Chinese visitors. Adding up the contributions across all source countries yields the total 12-month percent change.
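
The arithmetic in the quoted method can be sketched as follows. Only the 73 percent fall and the 32 percent share come from the quote; the other two source countries and their numbers are hypothetical, added so that the shares sum to 100.

```python
def contribution(pct_change, share_pct):
    """Percentage-point contribution of one source country.

    pct_change: 12-month percent change in visitors from that country.
    share_pct:  that country's share of total visitors in the base year, in percent.
    """
    return pct_change * share_pct / 100.0


# (percent change, share of total visitors in percent)
sources = {
    "China": (-73.0, 32.0),             # figures from the quoted example
    "Hypothetical A": (-40.0, 50.0),    # made-up values for illustration
    "Hypothetical B": (-10.0, 18.0),    # made-up values for illustration
}

contributions = {name: contribution(chg, share) for name, (chg, share) in sources.items()}
total_change = sum(contributions.values())

for name, value in contributions.items():
    print(f"{name}: {value:.2f} percentage points")
print(f"Total 12-month change: {total_change:.2f} percent")
```

With these inputs the Chinese contribution is -23.36 percentage points, and the three contributions sum to a total change of -45.16 percent.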

Single Variable Time-Series Forecasting¶

SARIMA (Seasonal Autoregressive Integrated Moving Average) can absorb seasonal patterns (weekly, monthly, quarterly, and yearly) and may be the most useful model to explore. The Covid shock needs to be accounted for; one way to sidestep it is to fit the model only on post-Covid flight and passenger data.

Interrupted Time Series (ITS) / Regression Discontinuity Design (RDD) could be especially helpful for detecting the treatment effect of Covid-19.
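
An interrupted time series regression of this kind is commonly written as below. This is a sketch: $D_t$ stands for the threshold variable and is assumed here to be an indicator that switches at the Covid cutoff, and $\text{week}_t$ is the time index. The symbols are illustrative labels.

$$Y_t = \beta_0 + \beta_1 D_t + \beta_2 \text{week}_t + \beta_3 (\text{week}_t \times D_t) + \epsilon_t$$

Under that assumption, $\beta_1$ captures the shift in level at the cutoff and $\beta_3$ the change in slope, while $\beta_0$ and $\beta_2$ describe the level and trend when $D_t = 0$.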

In [11]:
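import statsmodels.formula.api as smf
import matplotlib.pyplot as plt

# its_df is assumed to be prepared in an earlier cell (not shown in this export).
# With no weights passed, WLS reduces to ordinary least squares.
model = smf.wls("flights_number_7days ~ threshold + week + week * threshold", its_df).fit()
its_df["fitted"] = model.fittedvalues  # in-sample predictions, plotted in the next cell
model.summary().tables[1]  # coefficient table only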
Out[11]:
coef std err t P>|t| [0.025 0.975]
Intercept 354.3645 8.290 42.745 0.000 338.093 370.636
threshold -347.8326 8.940 -38.909 0.000 -365.379 -330.287
week -13.0165 1.277 -10.196 0.000 -15.522 -10.511
week:threshold 13.9039 1.277 10.884 0.000 11.397 16.411
In [12]:
import seaborn as sns  # missing import: sns is used below

sns.set(rc={'figure.figsize': (10, 8)})
sns.set_style("whitegrid")
# Observed weekly flights (green points) against the fitted line (red).
sns.scatterplot(x="date", y="flights_number_7days", data=its_df, color="green")
sns.lineplot(x="date", y="fitted", data=its_df,
             color="red", label="prediction")
plt.xlabel("Date")
plt.ylabel("Flights Number in 7 Days")
plt.title("Fiji - RDD Setup")
Out[12]:
Text(0.5, 1.0, 'Fiji - RDD Setup')

Vector Autoregressive Approach (applicable to flight & passenger data)¶

Related to the IMF's method, flights $Y_{ct}$ for country $c$ at time $t$ could be modeled as a linear regression on the flights of other countries (probably Fiji, Papua New Guinea, and Solomon Islands):
$$Y_{ct} = \beta_{1} \text{Fiji}_{t} + \beta_2 \text{Papua New Guinea}_{t} + \beta_3 \text{Solomon Islands}_{t} + \epsilon_{ct}, \quad t = 1, 2, \dots, n$$
The validity of this model can then be judged by $R^{2}$ (the share of variance explained). A further step is to test for Granger causality between the series.
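
A Granger causality check asks whether lagged values of one series improve the prediction of another beyond that series' own lags. The sketch below computes the standard F statistic by hand with NumPy on synthetic data; the helper names (lagged, rss, granger_f), the lag length of 2, and the data are all hypothetical. A library routine such as statsmodels' grangercausalitytests would normally be used and also reports p-values.

```python
import numpy as np


def lagged(series, p):
    """Columns [series[t-1], ..., series[t-p]] for t = p, ..., n-1."""
    n = len(series)
    return np.column_stack([series[p - k: n - k] for k in range(1, p + 1)])


def rss(X, y):
    """Residual sum of squares from an OLS fit of y on X."""
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    return float(resid @ resid)


def granger_f(y, x, p):
    """F statistic for H0: the p lags of x do not help predict y."""
    y = np.asarray(y, dtype=float)
    x = np.asarray(x, dtype=float)
    target = y[p:]
    const = np.ones((len(target), 1))
    X_restricted = np.hstack([const, lagged(y, p)])        # own lags only
    X_full = np.hstack([X_restricted, lagged(x, p)])       # plus lags of x
    rss_r, rss_u = rss(X_restricted, target), rss(X_full, target)
    df_den = len(target) - X_full.shape[1]
    return ((rss_r - rss_u) / p) / (rss_u / df_den)


# Synthetic example: y follows x with a one-period delay, x is pure noise.
rng = np.random.default_rng(0)
n = 300
x = rng.normal(size=n)
y = np.zeros(n)
y[1:] = 0.8 * x[:-1] + 0.1 * rng.normal(size=n - 1)

f_x_to_y = granger_f(y, x, p=2)  # should be large: x leads y
f_y_to_x = granger_f(x, y, p=2)  # should be small: y does not lead x
print(f_x_to_y, f_y_to_x)
```

Applied to the flight data, y would be the smaller country's series and x one of the candidate regressors such as Fiji, with the series made stationary first.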